Normal distribution
SAT scores follow a nearly normal distribution with a mean of 1500 points and a standard deviation of 300 points.
ACT scores also follow a nearly normal distribution with a mean of 21 points and a standard deviation of 5 points.
Nel scored 1800 points on their SAT.
Sian scored 24 points on their ACT.
Who performed better?
Setup
import pandas as pd
import numpy as np
import scipy.stats as st
import altair as alt
alt.data_transformers.disable_max_rows()
DataTransformerRegistry.enable('default')
Data
Generate data
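# Population parameters for each test
sat_mean = 1500
sat_sd = 300
act_mean = 21
act_sd = 5
# Observed scores for the two students
nel_sat = 1800
sian_act = 24
# Simulate 100,000 test takers per exam (seeded for reproducibility)
np.random.seed(0)
sat = np.random.normal(sat_mean, sat_sd, 100000)
act = np.random.normal(act_mean, act_sd, 100000)
df = pd.DataFrame({"sat": sat, "act": act})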
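This sanity check is not part of the original notebook. With 100,000 draws, the sample mean and standard deviation of each simulated column should be close to the parameters that generated it. The snippet regenerates the data with the same seed so that it runs on its own.

```python
import numpy as np

# Same parameters and seed as the data-generation cell
sat_mean, sat_sd = 1500, 300
act_mean, act_sd = 21, 5

np.random.seed(0)
sat = np.random.normal(sat_mean, sat_sd, 100000)
act = np.random.normal(act_mean, act_sd, 100000)

# Sample statistics should sit close to the generating parameters
print(f"SAT: mean={sat.mean():.1f}, sd={sat.std(ddof=1):.1f}")
print(f"ACT: mean={act.mean():.2f}, sd={act.std(ddof=1):.2f}")
```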
Data overview
df.head()
| | sat | act |
|---|---|---|
| 0 | 2029.215704 | 18.581013 |
| 1 | 1620.047163 | 27.440285 |
| 2 | 1793.621395 | 20.350606 |
| 3 | 2172.267960 | 20.009608 |
| 4 | 2060.267397 | 19.327562 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sat 100000 non-null float64
1 act 100000 non-null float64
dtypes: float64(2)
memory usage: 1.5 MB
Data visualization
# Kernel density estimate of the simulated SAT scores
chart = alt.Chart(df).transform_density(
'sat',
as_=['sat', 'density'],
).mark_area().encode(
x="sat:Q",
y='density:Q',
)
# Dashed vertical rule marking Nel's SAT score
nel = alt.Chart(pd.DataFrame({
'value': nel_sat,
'color': ['orange']
})
).mark_rule(
strokeDash=[5, 5],
strokeWidth=3
).encode(
x='value:Q',
color=alt.Color('color:N', scale=None)
)
chart + nel
# Kernel density estimate of the simulated ACT scores
chart = alt.Chart(df).transform_density(
'act',
as_=['act', 'density'],
).mark_area().encode(
x="act:Q",
y='density:Q',
)
# Dashed vertical rule marking Sian's ACT score
sian = alt.Chart(pd.DataFrame({
'value': sian_act,
'color': ['orange']
})
).mark_rule(
strokeDash=[5, 5],
strokeWidth=3
).encode(
x='value:Q',
color=alt.Color('color:N', scale=None)
)
chart + sian
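In each plot, the area under the density to the left of the dashed line is the share of test takers who scored lower than the student. As a rough cross-check, which is an addition and not from the original, that share can be counted directly from the simulated sample. The values should come out close to the normal-model percentiles computed later in this notebook.

```python
import numpy as np

sat_mean, sat_sd, nel_sat = 1500, 300, 1800
act_mean, act_sd, sian_act = 21, 5, 24

# Regenerate the simulated scores with the notebook's seed
np.random.seed(0)
sat = np.random.normal(sat_mean, sat_sd, 100000)
act = np.random.normal(act_mean, act_sd, 100000)

# Fraction of simulated scores to the left of each dashed line
share_below_nel = (sat < nel_sat).mean()
share_below_sian = (act < sian_act).mean()
print(f"Simulated share below Nel:  {share_below_nel:.3f}")
print(f"Simulated share below Sian: {share_below_sian:.3f}")
```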
Standardizing with Z scores
z_nel = (nel_sat - sat_mean) / sat_sd
z_nel
1.0
z_sian = (sian_act - act_mean) / act_sd
z_sian
0.6
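Both cells apply the Z score formula. Here x is the observed score, and μ and σ are the mean and standard deviation of that test's distribution. Written out with the numbers from this example:

$$
z = \frac{x - \mu}{\sigma}, \qquad
z_{\text{Nel}} = \frac{1800 - 1500}{300} = 1.0, \qquad
z_{\text{Sian}} = \frac{24 - 21}{5} = 0.6
$$

Nel scored 1.0 standard deviation above the SAT mean. Sian scored 0.6 standard deviations above the ACT mean.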
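Normal probability calculations
Nel’s percentile is the proportion of SAT test takers who scored lower than Nel. Under the normal model, this is the area under the standard normal curve to the left of Nel’s Z score, which st.norm.cdf() returns.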
nel_percentile = st.norm.cdf(z_nel)
print(f"Nel’s percentile: {nel_percentile:.2f}")
Nel’s percentile: 0.84
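The complementary question asks what proportion of SAT test takers scored better than Nel. That is the upper tail of the standard normal. The small sketch below uses SciPy's survival function st.norm.sf, which returns 1 - cdf and is more numerically accurate far out in the tail.

```python
import scipy.stats as st

z_nel = (1800 - 1500) / 300  # 1.0, as in the standardization step

# Upper-tail area: proportion of test takers scoring above Nel
better_than_nel = st.norm.sf(z_nel)  # survival function, equal to 1 - cdf
print(f"Proportion scoring above Nel: {better_than_nel:.2f}")  # 0.16
```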
Sian’s percentile is computed the same way. It is the proportion of ACT test takers who scored lower than Sian.
sian_percentile = st.norm.cdf(z_sian)
round(sian_percentile, 2)
0.73
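Placing the two results side by side answers the opening question. Nel's Z score (1.0) and percentile (about 0.84) are both higher than Sian's (0.6 and about 0.73), so Nel performed better relative to the other test takers. The normal CDF is increasing, so comparing Z scores and comparing percentiles always give the same ranking. A compact version of the comparison:

```python
import scipy.stats as st

# Z scores from the standardization step
z_nel = (1800 - 1500) / 300   # 1.0
z_sian = (24 - 21) / 5        # 0.6

# Percentiles under the standard normal model
nel_pct = st.norm.cdf(z_nel)    # about 0.84
sian_pct = st.norm.cdf(z_sian)  # about 0.73

winner = "Nel" if z_nel > z_sian else "Sian"
print(f"Nel:  z={z_nel:.1f}, percentile={nel_pct:.2f}")
print(f"Sian: z={z_sian:.1f}, percentile={sian_pct:.2f}")
print(f"{winner} performed better relative to their test's population")
```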
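We can also work in the other direction and find the Z score associated with a percentile. For example, to find Z for the 80th percentile, we use st.norm.ppf(). This is the percent point function, the inverse of the CDF, and it returns the value below which a given proportion of the distribution falls.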
st.norm.ppf(.80)
0.8416212335729143
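A Z score from st.norm.ppf() can be mapped back to the original scale by reversing the standardization, x = μ + zσ. The sketch below is an illustrative extension that applies this to the SAT parameters. SciPy's loc and scale arguments perform the same conversion in a single call.

```python
import scipy.stats as st

sat_mean, sat_sd = 1500, 300

z_80 = st.norm.ppf(0.80)  # about 0.8416

# ppf inverts cdf: feeding the Z score back in recovers the percentile
assert abs(st.norm.cdf(z_80) - 0.80) < 1e-12

# Reverse the standardization: x = mean + z * sd
sat_80 = sat_mean + z_80 * sat_sd
# Equivalent one-step version using a non-standard normal
sat_80_direct = st.norm.ppf(0.80, loc=sat_mean, scale=sat_sd)
print(f"80th percentile SAT score: {sat_80:.0f}")  # about 1752
```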